Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection
نویسندگان
چکیده
We propose a framework which ports Dirichlet Gaussian mixture model (DPGMM) based labels to deep neural network (DNN). The DNN is used to extract a low-dimensional unsupervised speech representation, named as unsupervised bottleneck feature (uBNF), which captures considerable information for sound cluster discrimination. We investigate the performance of uBNF in query-by-example spoken term detection (QbE-STD) on the TIMIT English speech corpus. Our uBNF performs comparably with the cross-lingual bottleneck features (M-BNF) extracted from a DNN trained using 171 hours of transcribed telephone speech in another language (Mandarin Chinese). With the score fusion of uBNF and M-BNF, we gain about 10% relative improvement in term of mean average precision (MAP) comparing with M-BNF. We also studied the performance of the framework with different input features and different lengths of temporal context.
منابع مشابه
Unsupervised speech processing with applications to query-by-example spoken term detection
This thesis is motivated by the challenge of searching and extracting useful information from speech data in a completely unsupervised setting. In many real world speech processing problems, obtaining annotated data is not cost and time effective. We therefore ask how much can we learn from speech data without any transcription. To address this question, in this thesis, we chose the query-by-ex...
متن کاملIIIT-H SWS 2013: Gaussian Posteriorgrams of Bottle-Neck Features for Query-by-Example Spoken Term Detection
This paper describes the experiments conducted for spoken web search (SWS) at MediaEval 2013 evaluations. A conventional approach is to train a multi-layer perceptron using high resource languages and then use it in the low resource scenario. However, phone posteriorgrams have been found to under-perform when the language they were trained on differs from the target language. In this paper, we ...
متن کاملQuery-by-example spoken term detection evaluation on low-resource languages
As part of the MediaEval 2013 benchmark evaluation campaign, the objective of the Spoken Web Search (SWS) task was to perform Query-by-Example Spoken Term Detection (QbE-STD), using spoken queries to retrieve matching segments in a set of audio files. As in previous editions, the SWS 2013 evaluation focused on the development of technology specifically designed to perform speech search in a low...
متن کاملQuery-by-example Spoken Term Detection using Attention-based Multi-hop Networks
Retrieving spoken content with spoken queries, or query-byexample spoken term detection (STD), is attractive because it makes possible the matching of signals directly on the acoustic level without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network, whose input is a spoken query and an audio segment containing sev...
متن کاملUnsupervised Hidden Markov Modeling of Spoken Queries for Spoken Term Detection without Speech Recognition
We propose an unsupervised technique to model the spoken query using hidden Markov model (HMM) for spoken term detection without speech recognition. By unsupervised segmentation, clustering and training, a set of HMMs, referred to as acoustic segment HMMs (ASHMMs), is generated from the spoken archive to model the signal variations and frame trajectories. An unsupervised technique is also desig...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016